AITopics | Lake County

Lake County

A Targeted Learning Framework for Estimating Restricted Mean Survival Time Difference using Pseudo-observations

arXiv.org Machine Learning, Jan-21-2026

A targeted learning (TL) framework is developed to estimate the difference in the restricted mean survival time (RMST) for a clinical trial with time-to-event outcomes. The approach starts by defining the target estimand as the RMST difference between investigational and control treatments. Next, an efficient estimation method is introduced: a targeted minimum loss estimator (TMLE) utilizing pseudo-observations. Moreover, a version of the copy reference (CR) approach is developed to perform a sensitivity analysis for right-censoring. The proposed TL framework is demonstrated using a real data application.
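As a rough illustration of the quantities named in this abstract, the sketch below computes the RMST as the area under a Kaplan-Meier curve up to a horizon and then forms leave-one-out (jackknife) pseudo-observations. It is a minimal sketch under stated assumptions, not the paper's TMLE or its sensitivity analysis; the function names `km_rmst` and `rmst_pseudo_observations` and the horizon argument `tau` are hypothetical.

```python
import numpy as np


def km_rmst(time, event, tau):
    """Area under the Kaplan-Meier survival curve on [0, tau]."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    # Distinct observed event times no later than the horizon tau.
    event_times = np.unique(time[event & (time <= tau)])
    surv, prev_t, area = 1.0, 0.0, 0.0
    for t in event_times:
        area += surv * (t - prev_t)          # rectangle before the drop at t
        at_risk = np.sum(time >= t)
        deaths = np.sum((time == t) & event)
        surv *= 1.0 - deaths / at_risk       # Kaplan-Meier product-limit step
        prev_t = t
    return area + surv * (tau - prev_t)      # final rectangle up to tau


def rmst_pseudo_observations(time, event, tau):
    """Jackknife pseudo-observations: n * theta_hat - (n - 1) * theta_hat_minus_i."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    n = len(time)
    full = km_rmst(time, event, tau)
    keep = np.ones(n, dtype=bool)
    pseudo = np.empty(n)
    for i in range(n):
        keep[i] = False                      # leave subject i out
        pseudo[i] = n * full - (n - 1) * km_rmst(time[keep], event[keep], tau)
        keep[i] = True
    return pseudo
```

With no censoring, each pseudo-observation reduces to min(T_i, tau), which is a convenient sanity check. A crude RMST difference could then compare the mean pseudo-observation between the two arms; the estimator described in the abstract goes further than this plug-in comparison.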

artificial intelligence, machine learning, sensitivity analysis, (14 more...)

arXiv.org Machine Learning

2601.06296

Country: North America > United States > Illinois > Lake County (0.14)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Immunology (0.95)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.90)

STREETS: A Novel Camera Network Dataset for Traffic Flow

Corey Snyder, Minh Do

Neural Information Processing Systems, Aug-20-2025, 08:34:27 GMT

dataset, graph, vehicle, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Illinois > Cook County > Chicago (0.24)
North America > United States > New York (0.04)
North America > United States > Minnesota (0.04)
(9 more...)

Industry:

Transportation > Ground > Road (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Information Technology (0.93)
(2 more...)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(3 more...)

HalluVerse25: Fine-grained Multilingual Benchmark Dataset for LLM Hallucinations

Abdaljalil, Samir; Kurban, Hasan; Serpedin, Erchin

arXiv.org Artificial Intelligence, Mar-10-2025

Large Language Models (LLMs) are increasingly used in various contexts, yet remain prone to generating non-factual content, commonly referred to as "hallucinations". The literature categorizes hallucinations into several types, including entity-level, relation-level, and sentence-level hallucinations. However, existing hallucination datasets often fail to capture fine-grained hallucinations in multilingual settings. In this work, we introduce HalluVerse25, a multilingual LLM hallucination dataset that categorizes fine-grained hallucinations in English, Arabic, and Turkish. Our dataset construction pipeline uses an LLM to inject hallucinations into factual biographical sentences, followed by a rigorous human annotation process to ensure data quality. We evaluate several LLMs on HalluVerse25, providing valuable insights into how proprietary models perform in detecting LLM-generated hallucinations across different contexts.

dataset, edited sentence, hallucination, (14 more...)

arXiv.org Artificial Intelligence

2503.07833

Country:

North America > United States > Texas > Brazos County > College Station (0.14)
Asia > Japan (0.14)
Europe > Austria > Vienna (0.14)
(15 more...)

Genre: Personal > Honors (0.95)

Industry: Government > Regional Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Achieving Operational Universality through a Turing Complete Chemputer

Gahler, Daniel; Thomas, Dean; Lach, Slawomir; Cronin, Leroy

arXiv.org Artificial Intelligence, Feb-4-2025

The most fundamental abstraction underlying all modern computers is the Turing Machine: if a computer can simulate a Turing Machine, an equivalence called Turing completeness, it can in theory achieve any task that can be algorithmically described by executing a series of discrete unit operations. In chemistry, programming chemical processes is demanding because it is hard to ensure that a process can be understood at a high level of abstraction and then reduced to practice. Herein we exploit the concept of Turing completeness applied to robotic platforms for chemistry that can synthesise complex molecules through unit operations that execute chemical processes using a chemically-aware programming language, XDL. We extend the concept of computability by computers to the synthesizability of chemical compounds by automated synthesis machines. The results of an interactive demonstration of Turing completeness using the colour gamut and conditional logic are presented, and examples of chemical use-cases are discussed. Over 16.7 million combinations of the Red, Green, Blue (RGB) colour space were binned into 5 discrete values and measured over 10 regions of interest (ROIs), affording 78 million possible states per step, and served as a proxy for conceptual chemical space exploration. This description establishes a formal framework for future chemical programming languages to ensure that complex logic operations are expressed and executed correctly, with the possibility of error correction, in the automated and autonomous pursuit of increasingly complex molecules.

artificial intelligence, programming language, turing machine, (15 more...)

arXiv.org Artificial Intelligence

2502.02872

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > United States > New York (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
(4 more...)

Genre: Research Report (0.40)

Industry: Materials > Chemicals (0.67)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

Optimal Survey Design for Private Mean Estimation

Chen, Yu-Wei; Pasupathy, Raghu; Awan, Jordan A.

arXiv.org Machine Learning, Jan-29-2025

This work identifies the first privacy-aware stratified sampling scheme that minimizes the variance for general private mean estimation under the Laplace, Discrete Laplace (DLap) and Truncated-Uniform-Laplace (TuLap) mechanisms within the framework of differential privacy (DP). We view stratified sampling as a subsampling operation, which amplifies the privacy guarantee; however, to have the same final privacy guarantee for each group, different nominal privacy budgets need to be used depending on the subsampling rate. Ignoring the effect of DP, traditional stratified sampling strategies risk significant variance inflation. We phrase our optimal survey design as an optimization problem, where we determine the optimal subsampling sizes for each group with the goal of minimizing the variance of the resulting estimator. We establish strong convexity of the variance objective, propose an efficient algorithm to identify the integer-optimal design, and offer insights on the structure of the optimal design.
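The remark that different nominal privacy budgets are needed depending on the subsampling rate can be illustrated with the standard privacy-amplification-by-subsampling bound, under which an ε-DP mechanism run on a subsample drawn at rate q satisfies roughly log(1 + q(e^ε − 1))-DP. The sketch below is an assumption-laden illustration using that textbook bound and the basic Laplace mechanism; the paper's own accounting and its DLap and TuLap variants may differ, and all function names are hypothetical.

```python
import math


def amplified_epsilon(nominal_eps, rate):
    """Effective epsilon after subsampling at `rate` (textbook amplification bound)."""
    return math.log1p(rate * math.expm1(nominal_eps))


def nominal_epsilon(target_eps, rate):
    """Nominal budget needed so that the amplified guarantee equals `target_eps`."""
    return math.log1p(math.expm1(target_eps) / rate)


def laplace_scale_for_mean(lower, upper, n, eps):
    """Laplace noise scale for the mean of n values clipped to [lower, upper].

    Assumes replace-one neighbouring datasets, so the sensitivity is (upper - lower) / n.
    """
    return (upper - lower) / (n * eps)
```

For a common final guarantee of ε = 1, a stratum sampled at rate 0.1 can use a larger nominal budget (about 2.9) than one sampled at rate 0.5 (about 1.5), so the more lightly sampled stratum needs less added noise per observation.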

artificial intelligence, machine learning, mechanism, (14 more...)

arXiv.org Machine Learning

2501.18121

Country:

Asia > Middle East > Jordan (0.05)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Indiana > Tippecanoe County > West Lafayette (0.04)
(4 more...)

Genre: Research Report (0.82)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Letters, Colors, and Words: Constructing the Ideal Building Blocks Set

Salazar, Ricardo; Jamshidi, Shahrzad

arXiv.org Artificial Intelligence, Jan-26-2025

Define a building blocks set to be a collection of n cubes (each with six sides) where each side is assigned one letter and one color from a palette of m colors. We propose a novel problem of assigning letters and colors to each face so as to maximize the number of words one can spell from a chosen dataset that are either mono words (all letters have the same color) or rainbow words (all letters have unique colors). We explore this problem using a chosen set of English words, up to six letters long, from the typical vocabulary of a 14-year-old in the United States, for n = 6 and m = 6, with the added restriction that each color appears exactly once on each cube. The problem is intractable, as the size of the solution space makes a brute force approach computationally infeasible. To address this, we explore a range of optimization techniques: random search, simulated annealing, two distinct tree search methods (greedy and best-first), and a genetic algorithm. Additionally, we attempted to implement a reinforcement learning approach; however, the model failed to converge to viable solutions within the problem's constraints. Among these methods, the genetic algorithm delivered the best performance, achieving a total of 2846 mono and rainbow words.
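The objective described in this abstract can be made concrete with a small scoring routine. The sketch below checks, by backtracking, whether a word can be spelled as a mono or rainbow word and counts how many words in a list qualify. It assumes each letter of a word must come from a different cube, which the abstract does not state explicitly; the data layout and the names `can_spell` and `score` are hypothetical, and this is not the authors' implementation.

```python
def can_spell(word, cubes, mode):
    """Return True if `word` can be spelled using one face from distinct cubes.

    cubes: list of cubes, each a list of (letter, color) faces.
    mode:  "mono" (all chosen faces share a color) or "rainbow" (all colors differ).
    """
    def search(pos, used_cubes, colors):
        if pos == len(word):
            return True
        for index, cube in enumerate(cubes):
            if index in used_cubes:
                continue
            for letter, color in cube:
                if letter != word[pos]:
                    continue
                if mode == "mono" and colors and color not in colors:
                    continue  # would introduce a second color
                if mode == "rainbow" and color in colors:
                    continue  # would repeat a color
                if search(pos + 1, used_cubes | {index}, colors | {color}):
                    return True
        return False

    return search(0, frozenset(), frozenset())


def score(words, cubes):
    """Number of words that are spellable as mono words or as rainbow words."""
    return sum(
        1 for w in words
        if can_spell(w, cubes, "mono") or can_spell(w, cubes, "rainbow")
    )
```

A search method such as simulated annealing or a genetic algorithm would then propose changes to `cubes` and use `score` as the fitness to maximize.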

artificial intelligence, machine learning, permutation, (12 more...)

arXiv.org Artificial Intelligence

2501.17188

Country: North America > United States > Illinois > Lake County > Lake Forest (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Federated Discrete Denoising Diffusion Model for Molecular Generation with OpenFL

Ta, Kevin; Foley, Patrick; Thieme, Mattson; Pandey, Abhishek; Shah, Prashant

arXiv.org Artificial Intelligence, Jan-21-2025

Generating unique molecules with biochemically desired properties to serve as viable drug candidates is a difficult task that requires specialized domain expertise. In recent years, diffusion models have shown promising results in accelerating the drug design process through AI-driven molecular generation. However, training these models requires massive amounts of data, which are often isolated in proprietary silos. OpenFL is a federated learning framework that enables privacy-preserving collaborative training across these decentralized data sites. In this work, we present a federated discrete denoising diffusion model that was trained using OpenFL. The federated model achieves comparable performance with a model trained on centralized data when evaluating the uniqueness and validity of the generated molecules. This demonstrates the utility of federated learning in the drug design process. OpenFL is available at: https://github.com/securefederatedai/openfl

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2501.12523

Country:

North America > United States > Illinois > Lake County > North Chicago (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > California > Santa Clara County > Santa Clara (0.04)

Genre: Research Report (0.83)

Industry:

Information Technology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.34)

Retrieval, Reasoning, Re-ranking: A Context-Enriched Framework for Knowledge Graph Completion

Li, Muzhi; Yang, Cehao; Xu, Chengjin; Jiang, Xuhui; Qi, Yiyan; Guo, Jian; Leung, Ho-fung; King, Irwin

arXiv.org Artificial Intelligence, Nov-12-2024

The Knowledge Graph Completion (KGC) task aims to infer the missing entity from an incomplete triple. Existing embedding-based methods rely solely on triples in the KG, which makes them vulnerable to specious relation patterns and long-tail entities. On the other hand, text-based methods struggle with the semantic gap between KG triples and natural language. Apart from triples, entity contexts (e.g., labels, descriptions, aliases) also play a significant role in augmenting KGs. To address these limitations, we propose KGR3, a context-enriched framework for KGC. KGR3 is composed of three modules. Firstly, the Retrieval module gathers supporting triples from the KG, collects plausible candidate answers from a base embedding model, and retrieves context for each related entity. Then, the Reasoning module employs a large language model to generate potential answers for each query triple. Finally, the Re-ranking module combines candidate answers from the two modules mentioned above, and fine-tunes an LLM to provide the best answer. Extensive experiments on widely used datasets demonstrate that KGR3 consistently improves various KGC methods. Specifically, the best variant of KGR3 achieves absolute Hits@1 improvements of 12.3% and 5.6% on the FB15k237 and WN18RR datasets.
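For readers unfamiliar with the Hits@1 figure quoted in this abstract, the sketch below shows how Hits@k is commonly computed: the fraction of queries whose correct entity appears among the top k ranked candidates. It is a minimal illustration with made-up entity names, it ignores the filtered-ranking conventions that KGC benchmarks often apply, and the function name `hits_at_k` is hypothetical.

```python
def hits_at_k(ranked_candidates, gold_answers, k=1):
    """Fraction of queries whose gold entity is within the top-k candidates.

    ranked_candidates: one best-first list of candidate entities per query.
    gold_answers:      the correct entity for each query, in the same order.
    """
    if len(ranked_candidates) != len(gold_answers):
        raise ValueError("need exactly one ranking per gold answer")
    hits = sum(
        1 for ranked, gold in zip(ranked_candidates, gold_answers)
        if gold in ranked[:k]
    )
    return hits / len(gold_answers)


# Illustrative data only: two queries, each with a ranked candidate list.
rankings = [["paris", "lyon", "nice"], ["berlin", "vienna", "bonn"]]
gold = ["paris", "vienna"]
print(hits_at_k(rankings, gold, k=1))
print(hits_at_k(rankings, gold, k=2))
```

An absolute improvement of 12.3% in Hits@1 means this fraction, computed at k = 1, rises by 0.123.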

artificial intelligence, large language model, natural language, (17 more...)

arXiv.org Artificial Intelligence

2411.08165

Country:

North America > United States > New Jersey > Bergen County (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > United Kingdom > England > Leicestershire > Leicester (0.05)
(30 more...)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment > Sports > Soccer (0.95)
Government > Regional Government > North America Government > United States Government (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.72)

Characteristics of Political Misinformation Over the Past Decade

Schlicht, Erik J

arXiv.org Artificial Intelligence, Nov-9-2024

Although misinformation tends to spread online, it can have serious real-world consequences. In order to develop automated tools to detect and mitigate the impact of misinformation, researchers must leverage algorithms that can adapt to the modality (text, images and video), the source, and the content of the false information. However, these characteristics tend to change dynamically across time, making it challenging to develop robust algorithms to fight misinformation spread. Therefore, this paper uses natural language processing to find common characteristics of political misinformation over a twelve-year period. The results show that misinformation has increased dramatically in recent years and that it has increasingly started to be shared from sources with primary information modalities of text and images (e.g., Facebook and Instagram), although video sharing sources containing misinformation are starting to increase (e.g., TikTok). Moreover, it was discovered that statements expressing misinformation contain more negative sentiment than accurate information. However, the sentiment associated with both accurate and inaccurate information has trended downward, indicating a generally more negative tone in political statements across time. Finally, recurring misinformation categories were uncovered that occur over multiple years, which may imply that people tend to share inaccurate statements about information they fear or don't understand (Science and Medicine, Crime, Religion), issues that impact them directly (Policy, Election Integrity, Economic), or Public Figures who are salient in their daily lives. Together, it is hoped that these insights will assist researchers in developing algorithms that are temporally invariant and capable of detecting and mitigating misinformation across time.

artificial intelligence, misinformation, social media, (12 more...)

arXiv.org Artificial Intelligence

2411.06122

Country:

North America > United States > New York (0.04)
North America > United States > Nevada > Clark County > Las Vegas (0.04)
North America > United States > Massachusetts (0.04)
(4 more...)

Genre: Research Report > New Finding (0.89)

Industry:

Media > News (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence (1.00)

STREETS: A Novel Camera Network Dataset for Traffic Flow

Neural Information Processing Systems, Oct-11-2024, 05:04:00 GMT

In this paper, we introduce STREETS, a novel traffic flow dataset from publicly available web cameras in the suburbs of Chicago, IL. We seek to address the limitations of existing datasets in this area. Many such datasets lack a coherent traffic network graph to describe the relationship between sensors. The datasets that do provide a graph depict traffic flow in urban population centers or highway systems and use costly sensors like induction loops. These contexts differ from that of a suburban traffic body.

novel camera network dataset, sensor, traffic flow, (1 more...)

Neural Information Processing Systems

Country:

North America > United States > Illinois > Cook County > Chicago (0.29)
North America > United States > Illinois > Lake County (0.09)

Industry: Consumer Products & Services > Travel (0.90)

Technology: Information Technology > Artificial Intelligence > Vision > Image Understanding (0.40)
